Maximal Lattice Overlap in Example-Based Machine Translation

Authors

  • Rebecca Hutchinson
  • Paul N. Bennett
  • Jaime G. Carbonell
  • Peter J. Jansen
  • Ralf D. Brown
Abstract

Example-Based Machine Translation (EBMT) retrieves pre-translated phrases from a sentence-aligned bilingual training corpus to translate new input sentences. EBMT uses long pre-translated phrases effectively but is subject to disfluencies at phrasal translation boundaries. We address this problem by introducing a novel method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of EBMT in a peak-to-peak comparison.

Work supported in part by the National Science Foundation under grant number IIS-9982226 (MLIAM: MUCHMORE: Multilingual Concept Hierarchies for Medical Information Organization and Retrieval) and award number IIS-9873009 (KDI: Universal Information Access: Translingual Retrieval, Summarization, Tracking, Detection and Validation).
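
The abstract describes combining phrasal translations whose target sides overlap, but it does not spell out the algorithm. The sketch below is a rough illustration only. It assumes that each pre-translated phrase is a lattice edge carrying a source span and a target word sequence. It also assumes that two edges "overlap" when their source spans overlap and a suffix of the first target exactly matches a prefix of the second. The names `Fragment`, `target_overlap`, `merge` and `merge_closure` and the toy English-French data are hypothetical; this is not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical lattice edge: source span [start, end) and its target words."""
    start: int
    end: int
    target: Tuple[str, ...]


def target_overlap(left: Fragment, right: Fragment) -> int:
    """Length of the longest suffix of left.target that equals a prefix of right.target.

    Returns 0 unless the source spans overlap, with `left` starting first and
    `right` extending further to the right (an assumed overlap condition).
    """
    if not (left.start < right.start < left.end < right.end):
        return 0
    for k in range(min(len(left.target), len(right.target)), 0, -1):
        if left.target[-k:] == right.target[:k]:
            return k
    return 0


def merge(left: Fragment, right: Fragment) -> Optional[Fragment]:
    """Join two fragments on their shared target words, or return None if they do not overlap."""
    k = target_overlap(left, right)
    if k == 0:
        return None
    return Fragment(left.start, right.end, left.target + right.target[k:])


def merge_closure(fragments: List[Fragment]) -> List[Fragment]:
    """Add every fragment reachable by repeated pairwise merges (fine for toy inputs only)."""
    found = set(fragments)
    frontier = list(fragments)
    while frontier:
        new: List[Fragment] = []
        for a in frontier:
            for b in list(found):  # snapshot; fragments added now are paired in the next round
                for left, right in ((a, b), (b, a)):
                    m = merge(left, right)
                    if m is not None and m not in found:
                        found.add(m)
                        new.append(m)
        frontier = new
    return sorted(found, key=lambda f: (f.start, f.end, f.target))


# Toy usage: "the cat sat on the mat" (source positions 0..5) with made-up French fragments.
fragments = [
    Fragment(0, 3, ("le", "chat", "assis")),  # "the cat sat"
    Fragment(2, 5, ("assis", "sur", "le")),   # "sat on the"
    Fragment(4, 6, ("le", "tapis")),          # "the mat"
]
full = [f for f in merge_closure(fragments) if (f.start, f.end) == (0, 6)]
print(" ".join(full[0].target))  # prints: le chat assis sur le tapis
```

Requiring an exact suffix/prefix match is only one simple reading of "overlap". A practical system would also need to score and choose among competing covers of the input rather than enumerate every merge.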

Similar resources

What is Example-Based Machine Translation?

We maintain that the essential feature that characterizes a Machine Translation approach and sets it apart from other approaches is the kind of knowledge it uses. From this perspective, we argue that Example-Based Machine Translation is sometimes characterized in terms of inessential features. We show that Example-Based Machine Translation, as long as it is linguistically principled, significan...

Reducing Boundary Friction Using Translation-Fragment Overlap

Many corpus-based Machine Translation (MT) systems generate a number of partial translations which are then pieced together rather than immediately producing one overall translation. While this makes them more robust to ill-formed input, they are subject to disfluencies at phrasal translation boundaries even for well-formed input. We address this “boundary friction” problem by introducing a met...

Manipuri-English Example Based Machine Translation System

The development of a Manipuri to English example based machine translation system is reported. The sentence level parallel corpus is built from comparable news corpora. POS tagging, morphological analysis, NER and chunking are applied on the parallel corpus for phrase level alignment. The translation process initially looks for an exact match in the parallel example base and returns the retriev...

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

Approximate Sentence Retrieval for Scalable and Efficient Example-Based Machine Translation

Approximate sentence matching (ASM) is an important technique for tasks in machine translation (MT) such as example-based MT (EBMT) which influences the translation time and the quality of translation output. We investigate different approaches to find similar sentences in an example base and evaluate their efficiency (runtime), effectiveness, and the resulting quality of translation output. A ...

Publication date: 2003